
feat: Vllm url support v1#4010

Merged
zhanweizhang7 merged 1 commit into v2 from pr@v2@feat_vllm_url_support_v1
Sep 9, 2025
Conversation

@shaohuzhang1
Contributor

feat: Vllm url support v1

@f2c-ci-robot

f2c-ci-robot bot commented Sep 9, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot

f2c-ci-robot bot commented Sep 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

raise Exception(_("Model does not exists or is not an LLM model"))

def process():
model = get_model_instance_by_model_workspace_id(model_id=model_id, workspace_id=workspace_id)
Contributor Author


This code snippet contains some minor issues and can be optimized:

Issues/Improvements:

  1. The condition checks both that the model exists in the database (QuerySet(Model).filter(...).exists()) and what type it is (model.model_type == "LLM") in a single expression. The existence check should run first, on its own, so the type check is only evaluated once a record is known to exist. This avoids unnecessary computation.

Optimization Suggestions:
2. Filter queries can be built more concisely with dictionary unpacking where appropriate.
3. If the dual check is kept, make sure the two checks run sequentially. Checking the type after the existence query still retrieves a record from the database, which wastes a round trip when the record does not match the desired conditions.

Modified Code:

def generate_prompt(self, instance: dict, with_valid=True):
    q = prompt.replace("{userInput}", message)
    messages[-1]['content'] = q

    # Check if the model exists first
    model_existence_query = QuerySet(Model).filter(
        workspace_id=workspace_id,
        id=model_id
    ).exists()

    if not model_existence_query:
        raise Exception(_("Model does not exist"))

    # The model is known to exist at this point, so fetch the instance
    model = get_model_instance_by_model_workspace_id(model_id=model_id, workspace_id=workspace_id)

    def process():
        pass

In the modified version, I removed the model-type check that ran after the model's existence had already been established by the ID lookup. Adjust the logic according to your application's requirements if such cases need to be handled differently.
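The fetch-once pattern this comment recommends can be sketched in plain Python. The `MODELS` dictionary below is a hypothetical in-memory stand-in for the Django `QuerySet(Model).filter(...)` lookup, used only so the sketch is self-contained:

```python
# Hypothetical stand-in for the database: maps (workspace_id, model_id)
# to a model record. In the real code this lookup would be
# QuerySet(Model).filter(workspace_id=..., id=...).first().
MODELS = {
    ("ws1", "m1"): {"id": "m1", "model_type": "LLM"},
    ("ws1", "m2"): {"id": "m2", "model_type": "EMBEDDING"},
}

def get_llm_model(workspace_id: str, model_id: str) -> dict:
    # Fetch the record once, then validate both existence and type on
    # the fetched result instead of issuing two separate queries.
    model = MODELS.get((workspace_id, model_id))
    if model is None or model["model_type"] != "LLM":
        raise ValueError("Model does not exist or is not an LLM model")
    return model
```

A single `.first()`-style fetch followed by in-memory checks keeps the database round trips to one, which is the optimization the review is pointing at.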


try:
client = OpenAI(
api_key=self.api_key,
Contributor Author


The provided code snippet appears to be part of a class that checks authentication and performs speech-to-text conversion using the OpenAI API. Here are some observations, potential issues, and recommendations:

Observations

  1. Conditional Check: The code includes a conditional check to ensure the base URL ends with /v1.

  2. API Key Handling: There is no explicit error handling for invalid API keys.

  3. Logging/Output: No logging statements are visible, which could make it difficult to debug or monitor the execution flow.

  4. Thread Safety: If this method is used concurrently within multiple threads or processes, additional thread safety measures (like locking) might be required.

  5. Exception Handling in client Creation: It's not clear how exceptions will be handled when creating the OpenAI client instance. A try-except block should be implemented around the client creation line to catch and handle potential errors like network issues or incorrect API key formats.

  6. Code Length: While short, it lacks specific comments explaining each step, making it harder for other developers to understand.

Additional Recommendations

  1. Comments: Add detailed comments to explain the purpose of each section of the code. For example, describing what variables store what and why certain conditions are checked.

    # Construct the base URL ensuring it ends with /v1
    base_url = self.api_url if self.api_url.endswith('/v1') else f"{self.api_url}/v1"
    
    # Handle the case where self.api_url does not end with '/v1'

2. **Error Handling and Logging**: Implement comprehensive error handling for both file operations and API requests. Consider logging details such as timestamps, HTTP status codes, and exception messages for better diagnostics.

   ```python
   from openai import OpenAI, AuthenticationError, APIConnectionError

   try:
       client = OpenAI(api_key=self.api_key)

       # Process response...

   except AuthenticationError as e:
       print(f"Authentication failed: {e}")

   except APIConnectionError as e:
       print(f"Network problem encountered: {e}")
   ```

Additionally, you can write logs using Python's built-in logging module or a third-party library if needed.

  3. Concurrency Consideration: If this method might be called from multiple parts of your application simultaneously, consider implementing mutexes or locks to protect shared resources between different threads or processes.

  4. Documentation: Ensure that any functions or classes defined in this code have appropriate docstrings explaining their usage, parameters, and return values. This helps fellow developers quickly grasp the functionality without ambiguity.

By addressing these points, you can enhance the robustness, maintainability, and debuggability of your code.
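The `/v1` handling discussed above could also be factored into a small helper. `normalize_base_url` is a name chosen here for illustration, not one taken from the PR:

```python
def normalize_base_url(api_url: str) -> str:
    """Return api_url guaranteed to end with '/v1' and no trailing slash."""
    # Drop any trailing slashes first so 'http://host/v1/' and
    # 'http://host/v1' are treated identically.
    url = api_url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"
```

Centralizing the suffix logic in one function keeps the conditional out of the credential-checking code and makes the behavior easy to unit-test.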

api_url=r_url,
params=model_kwargs,
**model_kwargs
)
Contributor Author


The provided code for the is_cache_model() and new_instance methods appears complete, with no significant irregularities or potential issues identified. However, I can suggest some optimizations:

  1. Variable Naming: For clarity and consistency, consider renaming r_url to something like _processed_api_url.
  2. Error Handling: Consider adding error handling to manage cases where 'api_url' might not be present in the model_credential dictionary.
  3. Parameter Consistency: Ensure that all dictionaries passed into the method have consistent keys.

Here's an optimized version of the code with these suggestions:

def is_cache_model():
    # Implement logic if needed
    pass

@staticmethod
def new_instance(model_type, model_name, model_credential: Dict[str, object], **model_kwargs):
    _processed_api_url = (
        model_credential.get('api_url', "")[:-4]
        if model_credential.get('api_url', "").endswith('/v1/')
        else model_credential.get('api_url', "")
    )

    try:
        return VllmBgeReranker(
            model=model_name,
            api_key=model_credential.get('api_key'),
            api_url=_processed_api_url,
            params=model_kwargs,
            **model_kwargs
        )
    except AttributeError as e:
        print(f"Error while setting up the model: {e}")

These improvements make the function more robust and easier to understand.
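The trimming in new_instance only handles a trailing '/v1/'. A more defensive variant, sketched here as a hypothetical helper named `strip_v1_suffix` (not part of the PR), also accepts the '/v1' form and a missing key:

```python
from typing import Optional

def strip_v1_suffix(api_url: Optional[str]) -> str:
    """Remove a trailing '/v1' or '/v1/' from api_url, tolerating None."""
    # Treat a missing credential as an empty string and normalize any
    # trailing slash before checking for the version suffix.
    url = (api_url or "").rstrip("/")
    return url[:-3] if url.endswith("/v1") else url
```

Passing the result of `model_credential.get('api_url')` through a helper like this avoids the AttributeError path entirely when the key is absent.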

@zhanweizhang7 zhanweizhang7 merged commit d2463ab into v2 Sep 9, 2025
3 of 6 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@feat_vllm_url_support_v1 branch September 9, 2025 05:49
